Non-Rationalised Geography NCERT Notes, Solutions and Extra Q & A (Class 6th to 12th)

Chapter 2 Data Processing

Organizing and presenting raw data is the initial step in making it comprehensible and ready for analysis. Various statistical techniques are then used to extract meaningful insights from the data. This chapter introduces key statistical techniques for data analysis in geography.

These techniques are broadly categorized into three types:

1. Measures of Central Tendency

2. Measures of Dispersion

3. Measures of Relationship

Measures of Central Tendency provide a single value that represents the typical or central value of a dataset.

Measures of Dispersion describe how spread out or varied the data points are, often in relation to the central value.

Measures of Relationship (like correlation) quantify the degree of association or interdependence between two or more variables.

Measures Of Central Tendency

Geographical characteristics such as rainfall amounts, elevation, population density, educational attainment levels, or age groups all show variation. To understand these variations collectively, we often seek a single representative value that best summarizes the entire set of observations.

This representative value usually lies near the centre of the data distribution. Statistical methods used to find this central point are called measures of central tendency, also known as statistical averages.

The most common measures of central tendency are the Mean, Median, and Mode. Each provides a different way of identifying a central representative value and is suited to different types of data.

Mean

The mean is the arithmetic average of a dataset. It is calculated by summing all the values in the dataset and dividing the sum by the total number of observations. The calculation differs slightly for ungrouped and grouped data, and in either case it can be done by a direct or an indirect method.

Computing Mean from Ungrouped Data:

Direct Method: Sum all individual values ($\sum x$) and divide by the number of observations (N).
$ \text{Mean} (\bar{X}) = \frac{\sum x}{N} $

Example 2.1: Calculate the mean rainfall for Malwa Plateau districts from the rainfall data given.

Answer:

Rainfall data (x) for 7 districts: 979, 1083, 833, 896, 891, 825, 977 mm.

Sum of rainfall ($\sum x$) = $979 + 1083 + 833 + 896 + 891 + 825 + 977 = 6484$ mm.

Number of districts (N) = 7.

$ \bar{X} = \frac{6484}{7} = 926.29 \text{ mm} $

Indirect Method: Used for larger datasets to simplify calculations. An 'assumed mean' (A) is chosen (ideally close to the actual mean). Deviations (d) of each observation from the assumed mean ($d = x - A$) are calculated. The mean is then calculated using the formula:

$ \text{Mean} (\bar{X}) = A + \frac{\sum d}{N} $

Example 2.1 (continued): Calculate the mean rainfall using an assumed mean of 800 mm.

Answer:

Assumed Mean (A) = 800.

Deviations (d = x - 800): 179, 283, 33, 96, 91, 25, 177.

Sum of deviations ($\sum d$) = $179 + 283 + 33 + 96 + 91 + 25 + 177 = 884$.

Number of districts (N) = 7.

$ \bar{X} = 800 + \frac{884}{7} = 800 + 126.29 = 926.29 \text{ mm} $

Districts in Malwa Plateau	Normal Rainfall (x) in mm	Deviation (d = x - 800)
Indore	979	179
Dewas	1083	283
Dhar	833	33
Ratlam	896	96
Ujjain	891	91
Mandsaur	825	25
Shajapur	977	177
$\sum x$ and $\sum d$	6484	884
$\bar{X} = \sum x / N$ and $\sum d / N$	926.29	126.29

The mean calculated by both methods is the same.
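
The two methods can be checked against each other with a short Python sketch. The function names `mean_direct` and `mean_assumed` are illustrative choices, not part of the chapter; the data are the rainfall figures from Example 2.1.

```python
# Normal rainfall (mm) for the seven Malwa Plateau districts in Example 2.1.
rainfall = [979, 1083, 833, 896, 891, 825, 977]


def mean_direct(values):
    """Direct method: sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def mean_assumed(values, assumed_mean):
    """Indirect method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


direct = mean_direct(rainfall)
indirect = mean_assumed(rainfall, 800)
print(round(direct, 2), round(indirect, 2))  # 926.29 926.29
```

Any assumed mean gives the same result; a value close to the true mean only keeps the deviations small when working by hand.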

Computing Mean from Grouped Data:

Direct Method: When data is grouped into classes with frequencies, the midpoint (x) of each class represents the values in that class. Calculate the product of the midpoint and frequency (fx) for each class. Sum all these products ($\sum fx$) and divide by the total number of observations (N, which is the sum of frequencies, $\sum f$).
$ \text{Mean} (\bar{X}) = \frac{\sum fx}{N} $

Example 2.2: Compute the average wage rate of factory workers using the given data (Table 2.2).

Wage Rate (Rs./day) Classes	Number of workers (f)
50 - 70	10
70 - 90	20
90 - 110	25
110 - 130	35
130 - 150	9

Answer:

Calculate midpoints (x) for each class and the product (fx).

Classes	Frequency (f)	Midpoints (x)	fx
50-70	10	60	600
70-90	20	80	1600
90-110	25	100	2500
110-130	35	120	4200
130-150	9	140	1260
Total	N = $\sum f = 99$		$\sum fx = 10160$

$ \bar{X} = \frac{10160}{99} = 102.6 $ Rs./day

Indirect Method (Short-cut Method): An assumed mean (A) is chosen (often the midpoint of the class with the highest frequency or near the middle of the range). Calculate the deviation (d) of the midpoint of each class from the assumed mean ($d = x - A$). Multiply each deviation by its frequency (fd). Sum these products ($\sum fd$).

$ \bar{X} = A + \frac{\sum fd}{N} $

Alternatively, a simplified deviation (u) can be used, obtained by dividing d by the class interval (i): $u = d/i$. Multiply u by the frequency (fu) and sum these products ($\sum fu$).

$ \bar{X} = A + \frac{\sum fu}{N} \times i $

In both formulas the sum carries its own sign, so a negative $\sum fd$ or $\sum fu$ places the mean below A.

Example 2.2 (continued): Compute the average wage rate using the indirect method, with an assumed mean of 100 (midpoint of 90-110 class) and interval of 20.

Answer:

Assumed Mean (A) = 100, Interval (i) = 20.

Classes	Frequency (f)	Midpoints (x)	Deviation (d = x - 100)	fd	Simplified Deviation (u = d/20)	fu
50-70	10	60	-40	-400	-2	-20
70-90	20	80	-20	-400	-1	-20
90-110	25	100	0	0	0	0
110-130	35	120	20	700	1	35
130-150	9	140	40	360	2	18
Total	N = $\sum f = 99$			$\sum fd = 260$		$\sum fu = 13$

Using $\sum fd$:

$ \bar{X} = 100 + \frac{260}{99} \approx 100 + 2.63 = 102.63 $ Rs./day (the direct method's 102.6 is the same value rounded to one decimal place)

Using $\sum fu$:

$ \bar{X} = 100 + \frac{13}{99} \times 20 = 100 + 0.1313 \times 20 = 100 + 2.63 = 102.63 $ Rs./day
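
As a check on the grouped-data calculations, the sketch below recomputes Example 2.2 by the direct method and by the simplified-deviation formula. Variable names such as `mean_step` are illustrative assumptions.

```python
# Wage classes (lower, upper limits in Rs./day) and worker counts from Example 2.2.
classes = [(50, 70), (70, 90), (90, 110), (110, 130), (130, 150)]
freqs = [10, 20, 25, 35, 9]

# The midpoint of each class stands in for every value in that class.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)

# Direct method: sum(f * x) / N
mean_direct = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Simplified-deviation method: A + (sum(f * u) / N) * i, with u = (x - A) / i
A, i = 100, 20
sum_fu = sum(f * (x - A) / i for f, x in zip(freqs, midpoints))
mean_step = A + sum_fu / n * i

print(n, sum_fu, round(mean_direct, 2), round(mean_step, 2))
# 99 13.0 102.63 102.63
```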

Median

The median is a positional average. It is the value that divides a dataset, when arranged in ascending or descending order, into two equal halves. It is not affected by the actual values of extreme observations, only by their position.

Computing Median for Ungrouped Data:

Arrange the data in ascending or descending order. The median is the value of the middle observation. The position of the median is found using the formula:

$ \text{Position of Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} $

Where N is the number of observations.

If N is odd, the median is the value at this position. If N is even, the median is the average of the values at the two middle positions (N/2 and (N/2)+1).

Example 2.3: Calculate median height for the given mountain peaks: 8,126 m, 8,611m, 7,817 m, 8,172 m, 8,076 m, 8,848 m, 8,598 m.

Answer:

Arrange in ascending order: 7,817; 8,076; 8,126; 8,172; 8,598; 8,611; 8,848.

N = 7.

Position of Median = $(7+1)/2 = 4^{\text{th}}$ item.

The 4th item in the arranged series is 8,172 m.

$ \text{Median} (M) = 8,172 \text{ m} $
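
The position rule translates directly into code. The following is a minimal sketch (the function name `median` is an illustrative choice) that handles both the odd and the even case described above.

```python
def median(values):
    """Median of ungrouped data: middle item of the sorted series."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th item, which is index mid when counting from 0.
        return ordered[mid]
    # Even N: average of the (N / 2)th and (N / 2 + 1)th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


peaks = [8126, 8611, 7817, 8172, 8076, 8848, 8598]
print(median(peaks))  # 8172
```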

Computing Median for Grouped Data:

For grouped data, the median is calculated using the cumulative frequency distribution to find the class where the median lies (the median class). The formula is:

$ M = l + \frac{\frac{N}{2} - c}{f} \times i $

Where:

M = Median
l = Lower limit of the median class
N = Total frequency ($\sum f$)
c = Cumulative frequency of the class preceding the median class
f = Frequency of the median class
i = Class interval width

Example 2.4: Calculate the median for the following frequency distribution:

Class	f
50-60	3
60-70	7
70-80	11
80-90	16
90-100	8
100-110	5

Answer:

Calculate cumulative frequencies (F) and find the median position (N/2).

Class	Frequency (f)	Cumulative Frequency (F)	Calculation of Median Class
50-60	3	3
60-70	7	10
70-80	11	21 (c)
80-90	16 (f)	37	Median group (N/2 = 25 is here)
90-100	8	45
100-110	5	50
Total	N = $\sum f = 50$

$ N/2 = 50/2 = 25 $. The cumulative frequency next greater than 25 is 37, which falls in the 80-90 class. So, the median class is 80-90.

l = 80, N = 50, c = 21 (cumulative frequency of the class before 80-90), f = 16 (frequency of 80-90 class), i = 10 (class interval width).

$ M = 80 + \frac{25 - 21}{16} \times 10 = 80 + \frac{4}{16} \times 10 = 80 + \frac{1}{4} \times 10 = 80 + 2.5 = 82.5 $
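
The grouped-median procedure can be sketched in Python as follows. The function `grouped_median` is a hypothetical helper that assumes equal class widths and classes listed in ascending order, as in Example 2.4.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a frequency distribution: M = l + ((N/2 - c) / f) * i."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cumulative = 0  # c: cumulative frequency of the classes already passed
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            # This is the median class: interpolate within it.
            return lower + (half - cumulative) / f * width
        cumulative += f


lower_limits = [50, 60, 70, 80, 90, 100]
freqs = [3, 7, 11, 16, 8, 5]
print(grouped_median(lower_limits, freqs, 10))  # 82.5
```

The loop stops at the first class whose cumulative frequency reaches N/2, which is the 80-90 class here.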

Mode

The mode is the value that appears most frequently in a dataset. It is represented by Z or $M_0$. The mode is generally used less often than the mean or median.

Computing Mode for Ungrouped Data:

For ungrouped data, arrange the measures in ascending or descending order and simply count the frequency of each value to identify the one that occurs most often.

Example 2.5: Calculate mode for test scores: 61, 10, 88, 37, 61, 72, 55, 61, 46, 22.

Answer:

Arrange in ascending order: 10, 22, 37, 46, 55, 61, 61, 61, 72, 88.

The score 61 occurs 3 times, more than any other score.

$ \text{Mode} (Z) = 61 $ (Unimodal - one mode)

Example 2.6: Calculate mode for test scores: 82, 11, 57, 82, 08, 11, 82, 95, 41, 11.

Answer:

Arrange in ascending order: 08, 11, 11, 11, 41, 57, 82, 82, 82, 95.

The scores 11 and 82 both occur 3 times, which is the highest frequency.

$ \text{Mode} (Z) = 11 \text{ and } 82 $ (Bimodal - two modes)

If three values have the same highest frequency, the distribution is trimodal. If many values have the same highest frequency, it's multimodal. If no value is repeated, there is no mode.
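
Counting frequencies is easy to automate. The sketch below uses Python's `collections.Counter`; the helper name `modes` and the choice to return an empty list when no value repeats are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return all values sharing the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([61, 10, 88, 37, 61, 72, 55, 61, 46, 22]))  # [61]
print(modes([82, 11, 57, 82, 8, 11, 82, 95, 41, 11]))   # [11, 82]
```

One returned value corresponds to a unimodal distribution, two to a bimodal one, and so on.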

Comparison Of Mean, Median And Mode

The relationship between mean, median, and mode can be visualized using a frequency distribution curve.

In a normal distribution, the frequency distribution is symmetrical and bell-shaped. In a perfect normal distribution, the mean, median, and mode all coincide and are located at the peak of the curve, representing the central value with the highest frequency.

Normal Distribution Curve with Mean, Median, Mode at the center

However, if the data distribution is not symmetrical but skewed (pushed towards one end), the mean, median, and mode will not coincide.

Positive Skew (Right Skew): The tail of the distribution extends towards higher values. The mode is at the peak, the median is to the right of the mode, and the mean is further to the right (pulled by the higher values). Mode < Median < Mean.

Positively Skewed Distribution Curve with Mean, Median, Mode positions shown

Negative Skew (Left Skew): The tail extends towards lower values. The mode is at the peak, the median is to the left of the mode, and the mean is further to the left (pulled by the lower values). Mean < Median < Mode.

Negatively Skewed Distribution Curve with Mean, Median, Mode positions shown

The choice of which measure of central tendency to use depends on the data type and distribution. The mean is sensitive to extreme values. The median is less affected by extreme values and is suitable for skewed distributions. The mode is useful for categorical data or identifying the most common value, but can be unstable and may not exist or be unique.
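
A small numerical illustration of this sensitivity, using the daily wage figures that appear in Example 2.7 later in the chapter: one extreme wage pulls the mean above the median, as expected for a positively skewed set of values. The variable names are illustrative.

```python
import statistics

# Daily wages (Rs.) from Example 2.7; the single wage of 100 is far above the rest.
wages = [40, 42, 45, 48, 50, 52, 55, 58, 60, 100]

mean_wage = sum(wages) / len(wages)
median_wage = statistics.median(wages)
print(mean_wage, median_wage)  # 55.0 51.0

# Without the extreme value, the two measures coincide.
trimmed = wages[:-1]
print(sum(trimmed) / len(trimmed), statistics.median(trimmed))  # 50.0 50
```

Removing one observation shifts the mean by 5 but the median by only 1, which is why the median is preferred for skewed data.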

Measures Of Dispersion

Measures of central tendency alone do not fully describe a dataset. They tell us the centre but not how the data points are spread out around that centre. Dispersion (or variability) refers to the scattering or spread of scores or measurements within a distribution.

Using measures of dispersion alongside central tendency provides a better understanding of the distribution's characteristics, such as its homogeneity or variability.

Dispersion serves two main purposes: understanding the composition of a distribution and comparing the stability or homogeneity of different distributions.

Common methods for measuring dispersion are:

Range
Quartile Deviation
Mean Deviation
Standard Deviation
Coefficient of Variation
Lorenz Curve

The Range, Standard Deviation (as an absolute measure), and Coefficient of Variation (as a relative measure) are widely used. Quartile Deviation and Mean Deviation are less common.

Range

The range (R) is the simplest measure of dispersion, calculated as the difference between the highest (L) and lowest (S) values in a dataset.

$ R = L - S $

Example 2.7: Calculate the range for daily wages: Rs. 40, 42, 45, 48, 50, 52, 55, 58, 60, 100.

Answer:

Highest value (L) = 100, Lowest value (S) = 40.

$ R = 100 - 40 = 60 $

The range is highly influenced by extreme values and is considered an unstable measure of dispersion, similar to how the mode is an unstable measure of central tendency.
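
The instability is easy to see with the wages from Example 2.7. In this sketch (the helper name `value_range` is an assumption), dropping the single highest wage cuts the range to a third.

```python
def value_range(values):
    """Range R = L - S: highest value minus lowest value."""
    return max(values) - min(values)


wages = [40, 42, 45, 48, 50, 52, 55, 58, 60, 100]
print(value_range(wages))       # 60
print(value_range(wages[:-1]))  # 20, once the extreme wage of 100 is left out
```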

Standard Deviation

The standard deviation (SD) is the most common and stable measure of dispersion. It is calculated around the mean and represents the typical distance of data points from the mean. It is defined as the square root of the variance.

The Greek letter $\sigma$ (sigma) often denotes Standard Deviation for a population, while 's' or SD is used for a sample.

The formula for Standard Deviation for ungrouped data is:

$ s = \sqrt{\frac{\sum x^2}{N}} $

Where $x$ is the deviation of each score from the mean ($x = X - \bar{X}$) and $x^2$ is the squared deviation.

The term $\frac{\sum x^2}{N}$ before taking the square root is called the variance ($s^2$). Standard deviation is the square root of variance, and variance is the square of standard deviation.

Computing Standard Deviation for Ungrouped Data:

Example 2.8: Calculate the standard deviation for scores: 01, 03, 05, 07, 09.

Answer:

First, calculate the mean ($\bar{X}$).

$ \bar{X} = (1+3+5+7+9)/5 = 25/5 = 5 $

Calculate deviations from the mean (x) and squared deviations (x$^2$).

X (Score)	$x = X - \bar{X}$ (Deviation from Mean)	$x^2$ (Squared Deviation)
1	$1 - 5 = -4$	$(-4)^2 = 16$
3	$3 - 5 = -2$	$(-2)^2 = 4$
5	$5 - 5 = 0$	$(0)^2 = 0$
7	$7 - 5 = 2$	$(2)^2 = 4$
9	$9 - 5 = 4$	$(4)^2 = 16$
$\sum X = 25$	$\sum x = 0$ (Check: sum of deviations is zero)	$\sum x^2 = 40$

N = 5.

$ s = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} \approx 2.83 $
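
The same steps in Python, as a minimal sketch that follows the chapter's formula and divides by N. The function name `population_sd` is illustrative.

```python
import math


def population_sd(values):
    """Standard deviation with N in the denominator, as in the chapter's formula."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(v - mean) ** 2 for v in values]  # x^2 for each score
    return math.sqrt(sum(squared_devs) / n)


scores = [1, 3, 5, 7, 9]
print(round(population_sd(scores), 2))  # 2.83
```

Python's standard library offers `statistics.pstdev`, which also divides by N, whereas `statistics.stdev` divides by N - 1 and would give a slightly larger value for the same scores.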

Computing Standard Deviation for Grouped Data:

For grouped data, a simplified calculation method similar to the indirect method for mean is often used.

$ s = i \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} $

Where:

s = Standard Deviation
i = Class interval width
f = Frequency of each class
u = Simplified deviation of the midpoint of each class from the assumed mean (u = (midpoint - Assumed Mean) / i)
$fu^2$ = product of frequency and squared simplified deviation
N = Total frequency ($\sum f$)
$\sum fu$ = Sum of (frequency * simplified deviation)
$\sum fu^2$ = Sum of (frequency * squared simplified deviation)

Example: Calculate the standard deviation for the following distribution:

Groups	f
120-130	2
130-140	4
140-150	6
150-160	12
160-170	10
170-180	6

Answer:

Calculate midpoints, choose an assumed mean, calculate simplified deviations (u), fu, and fu$^2$. Assumed mean (A) = 155 (midpoint of 150-160), interval (i) = 10.

Group	f	Midpoint (x)	$u = (x - 155) / 10$	fu	$u^2$	$fu^2$
120 - 130	2	125	-3	-6	9	18
130 - 140	4	135	-2	-8	4	16
140 - 150	6	145	-1	-6	1	6
150 - 160	12	155	0	0	0	0
160 - 170	10	165	1	10	1	10
170 - 180	6	175	2	12	4	24
Total	N = 40			$\sum fu = 2$		$\sum fu^2 = 74$

$ s = 10 \times \sqrt{\frac{74}{40} - \left(\frac{2}{40}\right)^2} = 10 \times \sqrt{1.85 - (0.05)^2} = 10 \times \sqrt{1.85 - 0.0025} = 10 \times \sqrt{1.8475} \approx 10 \times 1.359 = 13.59 $
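
The grouped formula can be verified with the sketch below. `grouped_sd` is a hypothetical helper that takes the class midpoints directly and assumes equal class widths.

```python
import math


def grouped_sd(midpoints, freqs, assumed_mean, width):
    """s = i * sqrt(sum(f*u^2)/N - (sum(f*u)/N)^2), with u = (x - A) / i."""
    n = sum(freqs)
    u = [(x - assumed_mean) / width for x in midpoints]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))
    return width * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)


midpoints = [125, 135, 145, 155, 165, 175]
freqs = [2, 4, 6, 12, 10, 6]
print(round(grouped_sd(midpoints, freqs, 155, 10), 2))  # 13.59
```

The second term under the square root corrects for the gap between the assumed mean and the true mean, so a different assumed mean (for example 145) yields the same standard deviation.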

Coefficient Of Variation (CV)

The Coefficient of Variation (CV) is a relative measure of dispersion. It is particularly useful for comparing the variability of datasets that are expressed in different units of measurement or have vastly different means. CV expresses the standard deviation as a percentage of the mean.

$ \text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100 $

$ \text{CV} = \frac{s}{\bar{X}} \times 100 $

Example: Calculate the CV for the dataset in Example 2.8 ($\bar{X} = 5$, $s \approx 2.83$).

Answer:

$ \text{CV} = \frac{2.83}{5} \times 100 = 0.566 \times 100 = 56.6\% $

A higher CV indicates greater relative variability or dispersion compared to the mean.
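
To illustrate the comparison across datasets, the sketch below computes the CV for the scores of Example 2.8 and for the rainfall figures of Example 2.1, which are in different units. The helper name is an assumption, and the standard deviation again divides by N.

```python
import math


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using N in the denominator."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sd / mean * 100


scores = [1, 3, 5, 7, 9]                            # Example 2.8
rainfall = [979, 1083, 833, 896, 891, 825, 977]     # Example 2.1, in mm

cv_scores = coefficient_of_variation(scores)
cv_rainfall = coefficient_of_variation(rainfall)
print(round(cv_scores, 1))  # 56.6
print(cv_rainfall < cv_scores)  # True: rainfall is relatively less variable
```

Although the rainfall values have a much larger standard deviation in absolute terms, their spread is small relative to their mean.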

Measures Of Relationship

Measures of relationship explore the association or interdependence between two or more variables. When changes in one variable are associated with changes in another, we say they are related or correlated. Correlation is a measure of this relationship.

Correlation describes both the nature (direction) and strength (degree) of the relationship between variables.

Direction Of Correlation

The direction of correlation indicates whether variables change together in the same direction or opposite directions.

Positive Correlation: Variables change in the same direction (as one increases, the other increases; as one decreases, the other decreases). Example: fertilizer consumption and crop yield (often positively correlated).
Negative Correlation: Variables change in opposite directions (as one increases, the other decreases). Example: altitude and air pressure (negatively correlated).
No Correlation (Zero Correlation): Changes in one variable do not correspond to any consistent change in the other variable.

A scatter plot visually shows the relationship between two variables. In a scatter plot, if points tend to rise from lower left to upper right, it indicates positive correlation. If points tend to fall from upper left to lower right, it indicates negative correlation. If points are scattered randomly with no clear pattern, it indicates no correlation.

Scatter plot showing a perfect positive linear relationship

Scatter plot showing a perfect negative linear relationship

Scatter plot showing no clear linear relationship between variables

Degree Of Correlation

The degree or strength of correlation measures how closely the two variables are related. It is expressed numerically by the correlation coefficient, which always lies between -1.00 and +1.00 and can never exceed 1 in either direction.

Perfect Positive Correlation (+1.00): All data points fall exactly on a straight line that slopes upwards from left to right. There is a perfect, direct relationship.
Perfect Negative Correlation (-1.00): All data points fall exactly on a straight line that slopes downwards from left to right. There is a perfect, inverse relationship.
Zero Correlation (0.00): There is no linear relationship between the variables; data points are scattered randomly.

Correlations between 0 and $\pm 1$ indicate varying degrees of relationship:

Diagram showing the range of correlation coefficients from -1 to +1

Weak Correlation: Data points are widely scattered around the trend line. The relationship is not strong. (e.g., correlation coefficient closer to 0, like $\pm 0.1$ to $\pm 0.3$).

Scatter plot showing weak negative correlation

Moderate Correlation: Data points show some clustering around a trend line, but with noticeable scatter. (e.g., correlation coefficient around $\pm 0.4$ to $\pm 0.6$).

Scatter plot showing moderate positive correlation

Strong Correlation: Data points cluster closely around a trend line, indicating a strong relationship. (e.g., correlation coefficient around $\pm 0.7$ to $\pm 0.9$).

Scatter plot showing strong positive correlation

Spearman’s Rank Correlation

Spearman's Rank Correlation, denoted by $r_s$ or $\rho$ (rho), is a non-parametric method used to measure the degree of association between two variables based on their ranks rather than their raw values. It is particularly useful when data is ordinal or when the number of observations is small.

The formula for Spearman's Rank Correlation is:

$ r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $

Where:

$r_s$ = Spearman's Rank Correlation coefficient
$\sum D^2$ = Sum of the squares of the differences between the ranks of corresponding pairs of X and Y variables
N = Number of pairs of observations (number of items)

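Steps for Calculation:

1. Rank the X scores and the Y scores separately, giving rank 1 to the highest score (columns XR and YR in the example).

2. Find the difference D between the two ranks of each pair and square it ($D^2$).

3. Add up the squared differences to obtain $\sum D^2$.

4. Substitute $\sum D^2$ and N into the formula.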

Example 2.9: Calculate Spearman’s Rank Correlation for the given scores in Economics (X) and Geography (Y).

Economics (X)	Geography (Y)
02	04
08	12
00	06
20	24
12	16
16	18
06	08
18	20
09	09
10	10

Answer:

The ranks (rank 1 for the highest score), their differences and the squared differences are tabulated below:

X (Score)	Y (Score)	XR (Rank of X)	YR (Rank of Y)	D (Difference in Ranks $|XR - YR|$)	D$^2$
2	4	9	10	$|9 - 10| = 1$	1
8	12	7	5	$|7 - 5| = 2$	4
0	6	10	9	$|10 - 9| = 1$	1
20	24	1	1	$|1 - 1| = 0$	0
12	16	4	4	$|4 - 4| = 0$	0
16	18	3	3	$|3 - 3| = 0$	0
6	8	8	8	$|8 - 8| = 0$	0
18	20	2	2	$|2 - 2| = 0$	0
9	9	6	7	$|6 - 7| = 1$	1
10	10	5	6	$|5 - 6| = 1$	1
				N = 10	$\sum D^2 = 8$

Apply the formula:

$ r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 8}{10(10^2 - 1)} = 1 - \frac{48}{10(100 - 1)} = 1 - \frac{48}{10(99)} = 1 - \frac{48}{990} $

$ r_s = 1 - 0.04848... \approx 1 - 0.05 = 0.95 $

The rank correlation coefficient is approximately 0.95, indicating a very strong positive correlation between the scores in Economics and Geography for this group of students.
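
The calculation can be reproduced with a short Python sketch. The helpers `ranks_descending` and `spearman_rank` are illustrative names, and the ranking assumes no tied scores within a variable, which holds for this example; tied values would need additional handling that the chapter does not cover.

```python
def ranks_descending(values):
    """Rank 1 for the highest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank(x, y):
    """r_s = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


economics = [2, 8, 0, 20, 12, 16, 6, 18, 9, 10]
geography = [4, 12, 6, 24, 16, 18, 8, 20, 9, 10]
print(round(spearman_rank(economics, geography), 2))  # 0.95
```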

Rank correlation is a good alternative when the number of cases is small. For larger datasets, calculating ranks can become cumbersome, and other correlation methods might be more efficient.

Exercises

This section contains exercises covering the calculation and interpretation of measures of central tendency, dispersion, and correlation, allowing students to practice and apply the statistical techniques learned in the chapter.